Multi-Document Summarization Based on Two-Level Sparse Representation Model

Authors

  • He Liu
  • Hongliang Yu
  • Zhi-Hong Deng
Abstract

Multi-document summarization is of great value to many real-world applications since it can help people get the main ideas within a short time. In this paper, we tackle the problem of extracting summary sentences from multi-document sets by applying sparse coding techniques, and we present a novel framework for this challenging problem. Based on the data reconstruction and sentence denoising assumptions, we present a two-level sparse representation model to depict the process of multi-document summarization. Three requisite properties are proposed to form an ideal reconstructable summary: Coverage, Sparsity and Diversity. We then formalize the task of multi-document summarization as an optimization problem according to these properties, and use a simulated annealing algorithm to solve it. Extensive experiments on the summarization benchmark data sets DUC2006 and DUC2007 show that our proposed model is effective and outperforms the state-of-the-art algorithms.
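The abstract does not give the objective function, so the following is only a minimal sketch of how sentence selection by simulated annealing might look, not the authors' model. Everything here is an assumption made for illustration: the function names (`cosine`, `cost`, `anneal_summary`), the `diversity_weight` parameter, the cooling schedule, and the objective itself, which substitutes a simple nearest-sentence cosine coverage term for the paper's two-level sparse reconstruction. Sparsity is imitated by fixing the summary size `k`.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cost(vectors, selected, diversity_weight=0.5):
    """Hypothetical stand-in objective (lower is better), NOT the paper's.

    Coverage: every sentence should be close to some selected sentence.
    Diversity: selected sentences should not resemble each other.
    Sparsity: imposed by keeping len(selected) fixed and small.
    """
    chosen = [vectors[i] for i in selected]
    coverage_loss = sum(1.0 - max(cosine(v, c) for c in chosen) for v in vectors)
    redundancy = sum(
        cosine(chosen[i], chosen[j])
        for i in range(len(chosen))
        for j in range(i + 1, len(chosen))
    )
    return coverage_loss + diversity_weight * redundancy


def anneal_summary(vectors, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Pick k sentence indices by simulated annealing over swap moves."""
    rng = random.Random(seed)
    n = len(vectors)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of sentences")
    current = rng.sample(range(n), k)
    current_cost = cost(vectors, current)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        # Propose swapping one selected sentence for one unselected sentence.
        outside = [i for i in range(n) if i not in current]
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        candidate_cost = cost(vectors, candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling schedule (an assumption)
    return sorted(best), best_cost
```

With sentence vectors such as TF-IDF rows, a call like `anneal_summary(vectors, k=5)` would return the chosen sentence indices and their cost under this stand-in objective. Accepting some worse swaps early on is what lets the search escape local minima that a purely greedy selection could get stuck in.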


Similar resources

Fusion of Thermal Infrared and Visible Images Based on Multi-scale Transform and Sparse Representation

Due to the differences between visible and thermal infrared images, combining these two types of images is essential for better understanding the characteristics of targets and the environment. Thermal infrared images are most important for distinguishing targets from the background based on radiation differences, which works well in all-weather and day/night conditions also in land s...


Graph-based models for multi-document summarization

University of Ljubljana, Faculty of Computer and Information Science. Ercan Canhasi. Graph-based models for multi-document summarization. This thesis is about automatic document summarization, with experimental results on general, query, update and comparative multi-document summarization (MDS). We describe prior work and our own improvements on some important aspects of a summarization system, incl...


Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis, are employed to d...


An Exploration of Document Impact on Graph-Based Multi-Document Summarization

The graph-based ranking algorithm has recently been exploited for multi-document summarization by making use only of the sentence-to-sentence relationships in the documents, under the assumption that all the sentences are indistinguishable. However, given a document set to be summarized, different documents are usually not equally important, and moreover, different sentences in a specific docum...


Semantic Role Frames Graph-based Multidocument Summarization

Multi-document summarization is a process of automatically creating a compressed version of a given collection of documents. Recently, graph-based models and ranking algorithms have been extensively researched by the extractive document summarization community. While most work to date focuses on sentence-level relations, in this paper we present a graph model that emphasizes not only sentence...




Publication date: 2015